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Abstract. When applying a statistical method in practice it often oc- 
curs that some observations deviate from the usual assumptions. How- 
ever, many classical methods are sensitive to outliers. The goal of robust 
statistics is to develop methods that are robust against the possibility 
that one or several unannounced outliers may occur anywhere in the 
data. These methods then allow to detect outlying observations by their 
residuals from a robust fit. We focus on high-breakdown methods, which 
can deal with a substantial fraction of outliers in the data. We give 
an overview of recent high-breakdown robust methods for multivariate 
settings such as covariance estimation, multiple and multivariate re- 
gression, discriminant analysis, principal components and multivariate 
calibration. 

Key words and phrases: Breakdown value, influence function, multi- 
variate statistics, outliers, partial least squares, principal components, 
regression, robustness. 



1. INTRODUCTION 

Many multivariate datasets contain outliers, that 
is, data points that deviate from the usual assump- 
tions and/or from the pattern suggested by the ma- 
jority of the data. Outliers are more likely to occur in 
datasets with many observations and/or variables, 
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and often they do not show up by simple visual in- 
spection. 

The usual multivariate analysis techniques (e.g., 
principal components, discriminant analysis and mul- 
tivariate regression) are based on empirical means, 
covariance and correlation matrices, and least squares 
fitting. All of these can be strongly affected by even 
a few outliers. When the data contain nasty outliers, 
typically two things happen: 

• the multivariate estimates differ substantially from 
the "right" answer, defined here as the estimates 
we would have obtained without the outliers; 

• the resulting fitted model does not allow to de- 
tect the outliers by means of their residuals, Ma- 
halanobis distances or the widely used "leave-one- 
out" diagnostics. 

The first consequence is fairly well known (although 
the size of the effect is often underestimated) . Unfor- 
tunately, the second consequence is less well known, 
and when stated many people find it hard to believe 
or paradoxical. Common intuition says that outliers 
must "stick out" from the classical fitted model, and 
indeed some of them may do so. But the most harm- 
ful types of outliers, especially if there are several of 
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them, may affect the estimated model so much "in 
their direction" that they are now well- fitted by it. 

Once this effect is understood, one sees that the 
following two problems are essentially equivalent: 

• Robust estimation: find a "robust" fit, which is 
similar to the fit we would have found without 
the outliers. 

• Outlier detection: find all the outliers that matter. 

Indeed, a solution to the first problem allows us to 
identify the outliers by their residuals, and so on, 
from the robust fit. Conversely, a solution to the 
second problem allows us to remove or downweight 
the outliers followed by a classical fit, which yields 
a robust result. 

Our research focuses on the first problem, and uses 
its results to answer the second. We prefer this ap- 
proach over the opposite direction because from a 
combinatorial viewpoint it is more feasible to search 
for sufficiently many "good" data points than to find 
all the "bad" data points. 

It turns out that most of the currently available 
highly robust multivariate estimators are difficult 
to compute, which makes them unsuitable for the 
analysis of large and/or high-dimensional datasets. 
Among the few exceptions is the minimum covari- 
ance determinant estimator (MOD) of Rousseeuw 
(1984, 1985). The MCD is a highly robust estima- 
tor of multivariate location and scatter, that can be 
computed efficiently with the FAST-MCD algorithm 
of Rousseeuw and Van Driessen (1999). 

Section 2 concentrates on robust estimation of lo- 
cation and scatter. We first describe the MCD es- 
timator and discuss its main properties. Alterna- 
tives for the MCD are explained briefiy with relevant 
pointers to the literature for more details. Section 3 
does the same for robust regression and mainly fo- 
cuses on the least trimmed squares (LTS) estimator 
(Rousseeuw, 1984), which is an analog of MCD for 
multiple regression. Since estimating the covariance 
matrix is the cornerstone of many multivariate sta- 
tistical methods, robust scatter estimators have also 
been used to develop robust and computationally ef- 
ficient multivariate techniques. The paper then goes 
on to describe robust methods for multivariate re- 
gression (Section 4), classification (Section 5), prin- 
cipal component analysis (Section 6), principal com- 
ponent regression (Section 7), partial least squares 
regression (Section 8) and other settings (Section 
9). Section 10 concludes with pointers to available 
software for the described techniques. 



2. MULTIVARIATE LOCATION AND 
SCATTER 

2.1 The Need for Robustness 

In the multivariate location and scatter setting 
we assume that the data are stored in an n x p data 
matrix X = (xi, . . . , x„)' with Xj = {xn, . . . , Xip)' the 
ith observation. Hence n stands for the number of 
objects and p for the number of variables. 

To illustrate the effect of outliers we consider the 
following engineering problem, taken from Rousseeuw 
and Van Driessen (1999). Philips Mecoma (The Nether- 
lands) produces diaphragm parts for television sets. 
These are thin metal plates, molded by a press. 
When starting a new production line, p = 9 charac- 
teristics were measured for n = 677 parts. The aim is 
to gain insight in the production process and to find 
out whether abnormalities have occurred. A classical 
approach is to compute the Mahalanobis distance 

(1) MD(x,) = V'(xi-AoySo'(x,-Ao) 

of each measurement Xj. Here fiQ is the arithmetic 
mean and Sq is the classical covariance matrix. The 
distance MD(xj) should tell us how far away Xj is 
from the center of the cloud, relative to the size of 
the cloud. 

In Figure 1 we plotted the classical Mahalanobis 
distance versus the index i, which corresponds to the 
production sequence. The horizontal line is at the 
usual cutoff value ^ X9,o.975 = 4.36. Figure 1 suggests 
that most observations are consistent with the clas- 
sical assumption that the data come from a multi- 
variate normal distribution, except for a few isolated 
outliers. This should not surprise us, even in the 
first experimental run of a new production line, be- 
cause the Mahalanobis distances are known to suffer 
from the masking effect. That is, even if there were a 
group of outliers (here, deformed diaphragm parts), 
they would affect /Ig and Sq in such a way that they 
get small Mahalanobis distances MD(xj) and thus 
become invisible in Figure 1. To get a reliable anal- 
ysis of these data we need robust estimators /I and 

that can resist possible outliers. For this purpose 
we will use the MCD estimates described below. 

2.2 Description of the IVICD 

The MCD method looks for the h observations 
(out of n) whose classical covariance matrix has the 
lowest possible determinant. The MCD estimate of 
location is then the average of these h points, whereas 
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Fig. 1. Mahalanobis distances of the Philips data. 



the MCD estimate of scatter is a multiple of their 
covariance matrix. The MCD location and scatter 
estimates are affine equivariant, which means that 
they behave properly under affine transformations 
of the data. That is, for an n x p dataset X the 
MCD estimates (/t, S) satisfy 

(2) /i(XA + l„v')=A(X)A + v, 

(3) S(XA + l„v') = A'I](X)A, 

for all p X 1 vectors v and all nonsingular pxp matri- 
ces A. The vector 1„ is (1,1,..., 1)' with n elements. 
Affine equivariance is a natural property of the un- 
derlying model and makes the analysis independent 
of the measurement scales of the variables as well as 
translations or rotations of the data. 

A useful measure of robustness is the finite-sample 
breakdown value (Donoho and Huber, 1983). The 
breakdown value e5^(/i,X) of an estimator at the 
dataset X is the smallest amount of contamination 
that can have an arbitrarily large effect on p,. Con- 
sider all possible contaminated datasets X obtained 
by replacing any m of the original observations by 
arbitrary points. Then the breakdown value of a lo- 
cation estimator fi is the smallest fraction m/n of 
outliers that can take the estimate over all bounds: 



(4) 



<(A,x) 



m 



min< — ;sup ||/i(X) — A(X)|| = oo 



n 



For many estimators e*(/i,X) varies only slightly 
with X and n, so that we can denote its limiting 
value (for n — > oo) by e*(/i). Similarly, the break- 
down value of a covariance matrix estimator S is 
defined as the smallest fraction of outliers that can 
take either the largest eigenvalue Ai(S) to infin- 
ity or the smallest eigenvalue Ap(S) to zero. The 
MCD estimates (/i, 'S) of multivariate location and 
scatter have breakdown value eJi(A) = £n(^) ~ (^^ — 
h)/n. The MCD has its highest possible breakdown 
value (e* = 50%) when h=[{n + p+l) /2] (see Lop- 
uhaa and Rousseeuw, 1991). Note that no affine 
equivariant estimator can have a breakdown value 
above 50%. For a recent discussion of the impor- 
tance of equivariance in breakdown considerations, 
see Davies and Gather (2005). 

An efficient algorithm to compute the MCD is the 
FAST-MCD algorithm explained in Appendix A.l. 
By default FAST-MCD computes a one-step weighted 
estimate given by 



(5) 



(6) 
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where 



1, ifd 



(Amcdi^mcd) 



0, otherwise, 



975' 



with /Imcd ^^'^ ^MCD the raw MCD estimates. The 
number d^^n in (6) is a correction factor (Pison, 
Van Aelst and Willems, 2002) to obtain unbiased 
and consistent estimates when the data come from 
a multivariate normal distribution. 

This one-step weighted estimator has the same 
breakdown value as the initial MCD estimator but a 
much better statistical efficiency. In practice we of- 
ten do not need the maximal breakdown value. For 
example, Hampel et al. (1986, pages 27-28) write 
that 10% of outliers is quite common. We typically 
use h = 0.75n so that e* = 25%, which is sufficiently 
robust for most applications and has a high sta- 
tistical efficiency. For example, with h = 0.75n the 
asymptotic efficiencies of the weighted MCD loca- 
tion and scatter estimators in 10 dimensions are 
94% and 88%, respectively (Croux and Haesbroeck, 
1999). 

2.3 Examples 

Let us now reanalyze the Philips data. For each 
observation Xj we now compute the robust distance 
(Rousseeuw and Leroy, 1987) given by 



(7) 



RD(xi) = V(xi-A)'5: (xi-A), 



where (/i, S) are the MCD location and scatter esti- 
mates. Recall that the Mahalanobis distances in Fig- 
ure 1 indicated no groups of outliers. On the other 
hand, the robust distances RD(xi) in Figure 2 show 
a strongly deviating group of outliers, ranging from 
index 491 to index 565. Something happened in the 
production process, which was not visible from the 
classical Mahalanobis distances due to the masking 
effect. Furthermore, Figure 2 also shows a remark- 
able change after the first 100 measurements. Both 
phenomena were investigated and interpreted by the 
engineers at Philips. 

The second dataset came from a group of Cal Tech 
astronomers working on the Digitized Palomar Sky 
Survey (see Odewahn et al., 1998). They made a 
survey of celestial objects (light sources) by record- 
ing nine characteristics (such as magnitude, area, 
image moments) in each of three bands: blue, red 
and near-infrared. The database contains measure- 
ments for 27 variables on 137,256 celestial objects. 
Based on exploratory data analysis Rousseeuw and 



Van Driessen (1999) selected six of the variables 
(two from each band). The classical Mahalanobis 
distances revealed a set of outliers which turned out 
to be objects for which at least one measurement fell 
outside its physically possible range. Therefore, the 
data was cleaned by removing all objects with phys- 
ically impossible measurements, leading to a cleaned 
dataset of size 132,402. The Mahalanobis distances 
of the cleaned data are shown in Figure 3(a). 

This plot (and a Q-Q plot) suggests that the dis- 
tances approximately come from the ^J'x^ distribu- 
tion, as would be the case if the data came from 
a homogeneous population. Figure 3(b) shows the 
robust distances computed with the FAST-MCD al- 
gorithm. In contrast to the innocent-looking Maha- 
lanobis distances, these robust distances reveal the 
presence of two groups. There is a majority with 
RD(xi) < 



'X6,o.975 ^ group with RD(xj) be- 
tween 8 and 16. Based on these results the astronomers 
noted that the lower group are mainly stars while 
the upper group are mainly galaxies. 

2.4 Other robust estimators of multivariate 
location and scatter 

The breakdown point is not the only important ro- 
bustness measure. Another key concept is the influ- 
ence function, which measures the effect on an esti- 
mator of adding a small mass at a specific point. (See 
Hampel et al., 1986 for details.) Robust estimators 
ideally have a bounded influence function, which 
means that a small contamination at any point can 
only have a small effect on the estimator. M-estimators 
(Maronna, 1976; Huber, 1981) were the first class of 
bounded infiuence estimators for multivariate loca- 
tion and scatter. Also the MCD and other estimators 
mentioned below have a bounded infiuence function. 
The first high-breakdown location and scatter esti- 
mator was proposed by Stahel (1981) and Donoho 
(1982). The Stahel-Donoho estimates are a weighted 
mean and covariance, like (5)-(6), where the weight 
Wi of an observation Xj depends on its outlyingness, 
given by 

|x-v — medj(x^v)| 



Ui 



sup 

l|v||=l 



madj(x^v) 



The estimator has good robustness properties but 
is computationally very intensive, which limits its 
use (Tyler, 1994; Maronna and Yohai, 1995). The 
Stahel-Donoho estimator measures the outlyingness 
by looking at all univariate projections of the data 
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Fig. 2. Robust distances of the Philips data. 
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Fig. 3. Cleaned digitized Palomar data: (a) Mahalanobis distances; (h) robust distances. 



and as such is related to projection pursuit methods 
as studied in Friedman and Tukey (1974), Huber 
(1985) and Croux and Ruiz-Gazen (2005). Another 
highly robust estimator of location and scatter based 
on projections has been proposed by Maronna, Sta- 
hel and Yohai (1992). 

Together with the MCD, Rousseeuw (1984, 1985) 
also introduced the minimum volume ellipsoid (MVE) 
estimator which looks for the minimal volume el- 



lipsoid covering at least half the data points. How- 
ever, the MVE has efficiency zero due to its low rate 
of convergence. Rigorous asymptotic results for the 
MCD and the MVE are given by Butler, Davies and 
Jhun (1993) and Davies (1992a). To improve the 
finite-sample efficiency of MVE and MCD a one- 
step weighted estimator (5)-(6) can be computed. 
The breakdown value and asymptotic properties of 
one-step weighted estimators have been obtained by 
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Fig. 4. Simple regression data with different types of outliers. 



Lopuhaa and Rousseeuw (1991) and Lopuhaa (1999). 
Alternatively, a one-step M-estimator starting from 
MVE or MCD can be computed as proposed by Davies 
(1992b). 

Another approach to improve the efficiency of MVE 
or MCD is to use a smoother objective function. An 
important class of robust estimators of multivariate 
location and scatter are S-estimators (Rousseeuw 
and Leroy, 1987; Davies, 1987), defined as the solu- 
tion (/i, S) which minimizes det(S) under the con- 
straint 



(8) 



1 " 



(x,-/2)'5]-i(xi 



<b 



over all vectors and all p x p positive definite sym- 
metric matrices S. Setting 6 = ^^^^[/odlX) ||] assures 
consistency at the model distribution F. The func- 
tion p is chosen by the statistician and is often taken 
to be Tukey's biweight p-function 



(9) p{x) 



X 

T 

„2 



6 



4 6 
X X 

2?^ 6?' 



if \x\ < c. 



if \x\ > c. 



The constant c determines the breakdown value which 
is given by e* = 6b/ c^. The properties of S-estimators 
have been investigated by Lopuhaa (1989). Related 
classes include CM-estimators (Kent and Tyler, 1996), 
MM-estimators (Tatsuoka and Tyler, 2000) and r- 
estimators Lopuhaa (1991). Positive-breakdown es- 
timators of location and scatter can also be used to 



construct formal outlier identification rules; see, for 
example, Becker and Gather (1999). 

To extend the notion of ranking to higher dimen- 
sions, Tukey introduced the half space depth. Depth- 
based estimators have been proposed and studied 
by Donoho and Gasko (1992), Rousseeuw, Ruts and 
Tukey (1999a), Liu, Parelius and Singh (1999), Zuo 
and Serfiing (2000a, 2000b) and Zuo, Cui and He 
(2004). 

Robust estimation and outlier detection in higher 
dimensions has been studied by Rocke (1996) and 
Rocke and Woodruff (1996). For very high-dimensional 
data, Maronna and Zamar (2002) and Alqallaf et 
al. (2002) proposed computationally efficient robust 
estimators of multivariate location and covariance 
that are not affine equivariant any more. Chen and 
Victoria-Feser (2002) address robust covariance ma- 
trix estimation with missing data. 

3. MULTIPLE REGRESSION 

3.1 Motivation 

The multiple regression model assumes that also 
a response variable y is measured, which can be ex- 
plained as an affine combination of the x- variables. 
More precisely, the model says that for all observa- 
tions (xj, yj) with i = 1, . . . ,n it holds that 



(10) 



hXil + • • • + OpXip + Op^i + Ei 



i = l,...,n, 
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where the errors are assumed to be i.i.d. with 
zero mean and constant variance cr^. The vector 
P = {9i, . . . , 9p)' is called the slope, and a = 0p+i 
the intercept. Denote Xj = (xji, . . . , Xjp)' and 9 = 

if3',ay = {ei,...,ep,ep+iy. 

The classical least squares method to estimate 6 
and a is extremely sensitive to regression outliers, 
that is, observations that do not obey the linear 
pattern formed by the majority of the data. In re- 
gression we can distinguish between different types 
of points. This is illustrated in Figure 4 for simple 
regression. Leverage points are observations {xi,yi) 
whose Xj are outlying; that is, Xj deviates from the 
majority in x-space. We call such an observation 
(xj,yj) a good leverage point if (xj,yj) follows the 
linear pattern of the majority, such as points 2 and 
21. If, on the other hand, (xj,yj) does not follow this 
linear pattern, we call it a bad leverage point, like 
4, 7 and 12. An observation whose x, belongs to the 
majority in x-space but where (xj,yj) deviates from 
the linear pattern is called a vertical outlier, like the 
points 6, 13 and 17. A regression dataset can thus 
have up to four types of points: regular observations, 
vertical outliers, good leverage points and bad lever- 
age points. Leverage points attract the least squares 
solution toward them, so bad leverage points are of- 
ten not apparent in a classical regression analysis. 

In low dimensions, as in this example, visual in- 
spection can be used to detect outliers and leverage 
points, but in higher dimensions this is not an op- 
tion anymore. Therefore, we need robust and com- 
putationally efficient estimators that yield a reli- 
able analysis of regression data. We consider the 
least trimmed squares estimator (LTS) proposed by 
Rousseeuw (1984) for this purpose. 

For a dataset Z = {(xj,yj); i = 1, . . . ,n} and for 
any 6 denote the corresponding residuals by rj = 
ri{0) = yi- P'xi - a = yi- 9'ui with Uj = (x-, 1)'. 
Then the LTS estimator is defined as the 9 which 
minimizes 



ill) 



i=l 



< {l"'^)n:n SLie the or- 



where {r^)i:n < ir'^)2:n < 
dered squared residuals (note that the residuals are 
first squared and then ordered). This is equivalent 
to finding the ^-subset with smallest least squares 
objective function, which resembles the definition 
of the MCD. The LTS estimate is then the least 
squares fit to these h points. The LTS estimates are 



regression, scale and affine equivariant. That is, for 
any X = (xi, . . . , x.„)' and y = (yi, . . . , y„)' it holds 
that 



(12) 



0(X, y + Xv + Inc) = 0(X, y) + (V, c)' 
0(X,cy)=c0(X,y), 

^(XA' + ln^r', y) = (^'(X, y) A^^ , a(X, y) 

-$\x,y)A-\y, 



for any vector v, any constant c and any nonsingular 
p X p matrix A. 

The breakdown value of a regression estimator 
at a dataset Z is the smallest fraction of outliers that 
can have an arbitrarily large effect on 6. Formally, it 
is defined by (4) where X is replaced by (X,y). For 
h = [{n + p+ l)/2] the LTS breakdown value equals 
e*(LTS) ~ 50%, whereas for larger h we have that 
e*(LTS) K. {n — h) /n. The usual choice « 0.75n 
yields e* (LTS) = 25%. 

When using LTS regression, the standard devia- 
tion of the errors can be estimated by 



(13) 



i=l 



where rj are the residuals from the LTS fit, and Ch,n 
makes a consistent and unbiased at Gaussian error 
distributions (Pison, Van Aelst and Willems, 2002). 
Note that the LTS scale estimator a is itself highly 
robust. Therefore, we can identify regression outliers 
by their standardized LTS residuals rj/a. 

To compute the LTS in an efficient way, Rousseeuw 
and Van Driessen (2006) developed the FAST-LTS 
algorithm outlined in Appendix A. 2. Similarly to the 
FAST-MCD algorithm, FAST-LTS returns weighted 
least squares estimates, given by 



(14) 



(15) 



01 



E 

\i=l 



E 

\i=l 



WiUiyi 



\ 



where Uj = (x^, 1)'. The weights are 



Wi 



1, if |ri(0LTs)/o-LTs| < 
0, otherwise. 



Xl, 0.975' 



where ^lts a-nd '5"lts are the raw LTS estimates. 
As before, d/j „ is a finite-sample correction factor. 
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These weighted estimates have the same breakdown 
value as the initial LTS estimates and a much better 
statistical efficiency. Moreover, from the weighted 
least squares estimates all the usual inferential out- 
put such as t-statistics, -F-statistics an statis- 
tic and the corresponding p-values can be obtained 
(Rousseeuw and Leroy, 1987). These p- values are 
approximate since they assume that the data with 
Wi = 1 come from the model (10) whereas the data 
with Wi = do not, and we usually do not know 
whether that is true. 

In Figure 4 we see that the LTS line obtained by 
FAST-LTS yields a robust fit that is not attracted 
by the leverage points on the right-hand side, and 
hence follows the pattern of the majority of the data. 
Of course, the LTS method is most useful when there 
are several x-variables. 

To detect leverage points in higher dimensions we 
must detect outlying Xj in x-space. For this pur- 
pose we will use the robust distances RDj based 
on the one-step weighted MOD of the previous sec- 
tion. On the other hand, we can see whether a point 
(xj,yi) lies near the majority pattern by looking at 
its standardized LTS residual ri/a. Rousseeuw and 
van Zomeren (1990) proposed an outlier map which 
plots robust residuals ri/a versus robust distances 
RD(xj), and indicates the corresponding cutoffs by 
horizontal and vertical lines. It automatically clas- 
sifies the observations into the four types of data 
points that can occur in a regression dataset. Fig- 
ure 5 is the outlier map of the data in Figure 4. 

To illustrate this plot, we again consider the data- 
base of the Digitized Palomar Sky Survey. Follow- 
ing Rousseeuw and Van Driessen (2006), we now use 
the subset of 56,744 stars (not galaxies) for which 
all the characteristics in the blue color (the F band) 
are available. The response variable MaperF is re- 
gressed against the other eight characteristics of the 
color band F. These characteristics describe the size 
of a light source and the shape of the spatial bright- 
ness distribution in a source. Figure 6(a) plots the 
standardized LS residuals versus the classical Ma- 
halanobis distances. Some isolated outliers in the 
y-direction as well as in x-space were not plotted to 
get a better view of the majority of the data. Ob- 
servations for which the standardized absolute LS 
residual exceeds the cutoff ^Xi,o.975 considered 
to be regression outliers, whereas the other observa- 
tions are thought to obey the linear model. Similarly, 
observations for which MD(xj) exceeds the cutoff 



975 a-re considered to be leverage points. Fig- 
ure 6(a) shows that most data points lie between the 
horizontal cutoffs at ± y/xi,o.975 which suggests that 
most data follow the same linear trend. On the other 
hand, the outlier map based on LTS residuals and 
robust distances RD(xj) shown in Figure 6(b) tells a 
different story. This plot reveals a rather large group 
of observations with large robust residuals and large 
robust distances. Hence, these observations are bad 
leverage points. This group turned out to be giant 
stars, which are known to behave differently from 
other stars. 

3.2 Other robust regression methods 

The development of robust regression often paral- 
leled that of robust estimators of multivariate loca- 
tion and scatter, and in fact more attention has been 
dedicated to the regression setting. Robust regres- 
sion also started with M-estimators (Huber, 1973, 
1981), later followed by R-estimators (Jureckova, 
1971) and L-estimators (Koenker and Portnoy, 1987) 
that all have breakdown value zero because of their 
vulnerability to bad leverage points. 

The next step was the development of generalized 
M-estimators (GM-estimators) that bound the in- 
fluence of outlying Xj by giving them a small weight 
(see, e.g., Krasker and Welsch, 1982; Maronna and 
Yohai, 1981). Therefore, GM-estimators are often 
called hounded influence methods, and they are more 
stable than M-, L- or R-estimators. See Hampel et 
al. (1986, Chapter 6) for an overview. Unfortunately, 
the breakdown value of GM-estimators with a mono- 
tone score function still goes down to zero for in- 
creasing p (Maronna, Burtos and Yohai, 1979). GM- 
estimators with a redescending score function can 
have a dimension-independent positive breakdown 
value (see He, Simpson and Wang, 2000). Note that 
for a small fraction of outliers in the data GM- 
estimators are robust, and they are computation- 
ally fast. For a discussion of the differences between 
bounded-infiuence estimators and high-breakdown 
methods see the recent book by Maronna, Martin 
and Yohai (2006). 

The first high-breakdown regression methods were 
least median of squares (LMS), LTS and the re- 
peated median. The origins of LMS go back to Tukey 
(Andrews et al., 1972), who proposed a univariate 
estimator based on the shortest half of the sample 
and called it the shorth. Hampel (1975, page 380) 
modified and generalized it to regression and stated 



ROBUST MULTIVARIATE STATISTICS 



9 



.6 

vartK«l (HitHen 
t3 



m 



i2 

bad ilBVflrag« ooivt * 



21 




1 11 4 S ■ 

Fig. 5. Regression outlier map of the data in Figure 4- 



S3 





h 




i 


i 

■- 




■ 














' • • " 


■ + 






ii " 









(a) 




ID Z9 30 5a 60 



9 to 2f] 30 SO ea 



(b) 



Fig. 6. Digitized Palomar Sky Survey data: regression of MaperF on eight regressors. (a.) Plot of LS residual versus Maha- 
lanobis distance MD{xi); (h) outlier map of LTS residual versus robust distance RD{jii). 



that the resulting estimator has a 50% breakdown 
value. He called it the shordth and considered it 
of special mathematical interest. Later, Rousseeuw 
(1984) provided theory, algorithms and programs 
for this estimator, as well as applications (see also 
Rousseeuw and Leroy, 1987). However, LMS has an 
abnormally slow convergence rate and hence its 
asymptotic efficiency is zero. In contrast, LTS is 
asymptotically normal and can be computed much 
faster. The other high-breakdown regression method 



was Siegel's repeated median technique (Siegel, 1982), 
which has good properties in the simple regression 
case (p = 2) but is no longer affine equivariant in 
multiple regression (p > 3). 

As for multiple location and scatter, the efficiency 
of a high-breakdown regression estimator can be im- 
proved by computing one-step weighted least squares 
estimates (14)-(15) or by computing a one-step M- 
estimator as done in Rousseeuw (1984). In order to 
combine these advantages with those of the bounded 
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influence approach, it was later proposed by Simp- 
son, Ruppert and Carroll (1992), Coakley and 
Hettmansperger (1993) and Simpson and Yohai (1998) 
to compute a one-step GM-estimator starting from 
LTS. 

Tests and variable selection for robust regression 
were developed by Markatou and He (1994), Marka- 
tou and Hettmansperger (1990), Ronchetti and 
Staudte (1994) and Ronchetti, Field and Blanchard 
(1997). For high-breakdown methods, variable se- 
lection by all subsets regression becomes infeasible. 
One way out is to apply the robust method to all 
variables, yielding weights, and then to apply the 
classical selection methods for weighted least squares. 
Alternatively, a robust measure (Croux and De- 
hon, 2003) or a robust penalized selection criterion 
(Miiller and Welsh, 2006) can be used in a forward 
or backward selection strategy. 

Another approach to improve the efficiency of the 
LTS is to replace its objective function by a smoother 
alternative. Similarly as in (8), S-estimators of re- 
gression (Rousseeuw and Yohai, 1984) are defined 
as the solution {6, a) that minimizes a subject to 
the constraint 

The constant b usually equals E^[p{Y)] to assure 
consistency at the model with normal error distri- 
bution, and as before p is often taken to be Tukey's 
biweight p function (9). Salibian-Barrera and Yohai 
(2006) recently constructed an efficient algorithm to 
compute regression S-estimators. Related classes of 
efficient high-breakdown estimators include 
MM-estimators (Yohai, 1987), r-estimators (Yohai 
and Zamar, 1988), a new type of R-estimators 
(Hossjer, 1994), generalized S-estimators (Croux, 
Rousseeuw and Hossjer, 1994), CM-estimators 
(Mendes and Tyler, 1996) and generalized r-esti- 
mators (Ferretti et al., 1999). Inference for these es- 
timators is usually based on their asymptotic distri- 
bution at the central model. Alternatively, for MM- 
estimators Salibian-Barrera and Zamar (2002) de- 
veloped a fast and robust bootstrap procedure that 
yields reliable nonparametric robust inference. 

To extend the good properties of the univariate 
median to regression, Rousseeuw and Hubert (1999) 
introduced the notions of regression depth and deep- 
est regression. The deepest regression estimator has 
been studied by Rousseeuw, Van Aelst and Hubert 



(1999b), Van Aelst and Rousseeuw (2000), Van Aelst 
et al. (2002) and Bai and He (2000). 

Another important robustness measure, besides 
the breakdown value and the influence function, is 
the maxbias curve. The maxbias is the maximum 
possible bias of an estimator caused by a fixed frac- 
tion £ of contamination. The maxbias curve plots 
the maxbias of an estimator as a function of the 
fraction e = m/n of contamination. Maxbias curves 
of robust regression estimators have been studied 
in Martin, Yohai and Zamar (1989), He and Simp- 
son (1993), Croux, Rousseeuw and Hossjer (1994), 
Adrover and Zamar (2004) and Berrendero and Za- 
mar (2001). Projection estimators for regression 
(Maronna and Yohai, 1993) combine a low maxbias 
with high breakdown value and bounded influence 
but they are difficult to compute. 

Unbalanced binary regressors that contain, for ex- 
ample, 90% of zeroes and 10% of ones might be ig- 
nored by standard robust regression methods. Ro- 
bust methods for regression models that include cat- 
egorical or binary regressors have been developed 
by Hubert and Rousseeuw (1996) and Maronna and 
Yohai (2000). Robust estimators for orthogonal re- 
gression and error-in- variables models have been con- 
sidered by Zamar (1989, 1992) and Maronna (2005). 

4. MULTIVARIATE REGRESSION 

The regression model can be extended to the case 
where we have more than one response variable. For 
p-variate predictors Xj = (xn, . . . , Xip)' and g-variate 
responses yj = {yn, . . . , Uig)' the multivariate regres- 
sion model is given by 

(17) yi = B'x, + a + ei, 

where B is the px q slope matrix, a is the g-dimen- 
sional intercept vector, and the errors Si = {en, . . . , Eig)' 
are i.i.d. with zero mean and with Cov(e) = 'S^ a 
positive definite matrix of size q. Note that for q = l 
we obtain the multiple regression model of the pre- 
vious section. On the other hand, putting p=l and 
Xi = 1 yields the multivariate location and scatter 
model of Section 2. It is well known that the least 
squares solution can be written as 

(18) a = p,y-B'fi^, 

= 'Syy — 13 'SxxB, 
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and S 



^xx ^xy 
"^yx "^yy 



where Uj = (x^,l)' and di is a consistency factor. 
The weights Wi are given by 



are the empirical mean and covariance matrix of the 
joint (x,y) variables. 

Vertical outliers and bad leverage points highly 
influence the least squares estimates in multivari- 
ate regression, and may make the results completely 
unreliable. Therefore, robust alternatives have been 
developed. 

Rousseeuw et al. (2004) proposed to use the MCD 
estimates for the center /2 and scatter matrix XI 
in (18). The resulting estimates are called MCD re- 
gression estimates. It has been shown that the MCD 
regression estimates are regression, y-affine and x- 
affine equivariant. With X = (xi, . . . , x„)', Y = 
(yi; ■ • ■ ) Yn)' and d = {3 , &)' this means that 

0(X,Y + XD + l„w') 

= 0(X,Y) + (D',w)', 
0(X,YC + l„w') 
(19) =0(X,Y)C + (O;„w)', 

0(XA' + l„v',Y) 

= (B'(X,Y)A-i,a(X,Y) 
-B'(X,Y)A-1v)', 

where D is any p q matrix, A is any nonsingu- 
lar p X p matrix, C is any nonsingular q x q ma- 
trix, V is any p-dimensional vector and w is any 
g-dimensional vector. Here Opg is the p x q matrix 
consisting of zeroes. 

MCD regression inherits the breakdown value of 
the MCD estimator, thus eJi(^) ~ {n — h)/n. To ob- 
tain a better efficiency, the one-step weighted MCD 
estimates are used in (18) and followed by the re- 
gression weighting step described below. For any fit 
9 denote the corresponding Q'-dimensional residuals 
by ri{6) = yt — B Xi — a. Then the weighted regres- 
sion estimates are given by 



(20) 01= j^^u^iUiU^j l^WiUiy^j, 

(21) tl = dl(J2w^] (j2wMei)riieiy 



if (i(ri(0MCD)) < ^JXq,0.975^ 

otherwise. 



Wi 



with (i(ri(0MCD)) = V'^*(^MCD)'(Se)-iri(%cD) the 
robust distances of the residuals, corresponding to 
the initial MCD regression estimates ^mcd a-nd S^. 
Note that these weighted regression estimates (20)- 
(21) have the same breakdown value as the initial 
MCD regression estimates. 

To illustrate MCD regression we analyze a dataset 
from Shell's polymer laboratory, described in Roussee- 
uw et al. (2004). The dataset consists of n = 217 ob- 
servations with p = A predictor variables and q = ?> 
response variables. The predictor variables describe 
the chemical characteristics of a piece of foam, whereas 
the response variables measure its physical proper- 
ties such as tensile strength. The physical properties 
of the foam are determined by the chemical compo- 
sition used in the production process. Multivariate 
regression is used to establish a relationship between 
the chemical inputs and the resulting physical prop- 
erties of the foam. After an initial exploratory study 
of the variables, a robust multivariate MCD regres- 
sion was used. 

To detect leverage points and outliers the outlier 
map of Rousseeuw and van Zomeren (1990) has been 
extended to multivariate regression. In multivari- 
ate regression the robust distances of the residuals 
Yi{6i) are plotted versus the robust distances of the 
Xi . Figure 7 is the outlier map of the Shell foam data. 
Observations 215 and 110 lie far from both the hori- 
zontal cutoff line at y^xi.o 975 = 3-06 and the vertical 

cutoff line at X4,o.975 — 3-34. These two observa- 
tions can thus be classified as bad leverage points. 
Several observations lie substantially above the hor- 
izontal cutoff but not to the right of the vertical 
cutoff, which means that they are vertical outliers 
(their residuals are outlying but their x-values are 
not). 

Based on this list of special points the scientists 
who had made the measurements found out that a 
fraction of the observations in Figure 7 were made 
with a different production technique and hence be- 
long to a different population with other character- 
istics. These include the observations 210, 212 and 
215. We therefore remove these observations from 
the data, and retain only observations from the in- 
tended population. 
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Fig. 7. Regression outlier map of the foam data. 
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Regression outlier map of the corrected foam data. 



Running the method again yields the outUer map 
in Figure 8. Observation 110 is still a bad leverage 
point, and also several of the vertical outliers re- 
main. No chemical/physical mechanism was found 
to explain why these points are outliers, leaving open 



the possibility of some large measurement errors. 
But the detection of these outliers at least provides 
us with the option to choose whether or not to allow 
them to affect the final result. 
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Since MCD regression is mainly intended for re- 
gression data with random carriers, AguUo, Croux 
and Van Aelst (2006) developed an alternative ro- 
bust multivariate regression method which can be 
seen as an extension of LTS to the multivariate set- 
ting. This multivariate least trimmed squares es- 
timator (MLTS) can also be used in cases where 
the carriers are fixed. The MLTS looks for a subset 
of size h such that the determinant of the covari- 
ance matrix of its residuals corresponding to its least 
squares fit is minimal. Similarly as for MCD regres- 
sion, the MLTS has breakdown value e5^(0MLTs) ~ 
(n — h)/n and the equivariance properties (19) are 
satisfied. The MLTS can be computed quickly with 
an algorithm similar to that in Appendix A.l. To 
improve the efficiency while keeping the breakdown 
value, a one-step weighted MLTS estimator can be 
computed using expressions (20)-(21). Alternatively, 
Van Aelst and Willems (2005) introduced multivari- 
ate regression S-estimators and extended the fast 
robust bootstrap methodology of Salibian-Barrera 
and Zamar (2002) to this setting while Garcia Ben, 
Martinez and Yohai (2006) proposed r-estimators 
for multivariate regression. 

5. CLASSIFICATION 

The goal of classification, also known as discrim- 
inant analysis or supervised learning, is to obtain 
rules that describe the separation between known 
groups of observations. Moreover, it allows to clas- 
sify new observations into one of the groups. We de- 
note the number of groups by / and assume that we 
can describe our experiment in each population iTj 
by a p-dimensional random variable Xj with density 
function fj. We write pj for the membership proba- 
bility, that is, the probability for an observation to 
come from ttj. The maximum likelihood rule then 
classifies an observation x into tt^ if ln(pfc/fc(x)) is 
the maximum of the set {ln{pjfj{x));j = !,...,?}. 
If we assume that the density fj for each group is 
Gaussian with mean fij and covariance matrix 'Sj, 
then it can be seen that the maximum likelihood rule 
is equivalent to maximizing the discriminant scores 
df{x) with 

dy(x) = -iln|I]j| 

(22) -i(x-/x^.)'5^7Hx-/^,) 
+ Hpj). 

That is, x is allocated to vTfc if d^(x) > (i^(x) for all 
j = 1,. . . ,1 (see, e.g., Johnson and Wichern, 1998). 



In practice Hj, Sj and pj have to be estimated. 
Classical quadratic discriminant analysis (CQDA) 
uses the group's mean and empirical covariance ma- 
trix to estimate /x^- and Sj. The membership prob- 
abilities are usually estimated by the relative fre- 
quencies of the observations in each group, hence 
Pj = rij/n with rij the number of observations in 
group j. 

A robust quadratic discriminant analysis (RQDA) 
is derived by using robust estimators of /Xj, Sj and 
Pj. In particular, we can apply the weighted MCD 
estimator of location and scatter in each group. As a 
byproduct of this robust procedure, outliers (within 
each group) can be distinguished from the regular 
observations. Finally, the membership probabilities 
can be robustly estimated as the relative frequency 
of regular observations in each group. For an out- 
line of this approach, see Hubert and Van Driessen 
(2004). 

When the groups are assumed to have a common 
covariance matrix S, the quadratic scores (22) can 
be simplified to 

(23) 4 W = /^jS^^x - + HPj) 

since the terms — iln|S| and — ^x'5]~^x do not de- 
pend on j. The resulting scores (23) are linear in x, 
hence the maximum likelihood rule belongs to the 
class of linear discriminant analysis. It is well known 
that if we have only two populations {I = 2) with 
a common covariance structure and if both groups 
have equal membership probabilities, this rule coin- 
cides with Fisher's linear discriminant rule. Robust 
linear discriminant analysis based on the MCD esti- 
mator or S-estimators has been studied in Hawkins 
and McLachlan (1997), He and Fung (2000), Croux 
and Dehon (2001) and Hubert and Van Driessen 
(2004). The latter paper computes fij and by 
weighted MCD and then defines the pooled covari- 
ance matrix Yl = (X]j=i ?^jSj)/n. 

We consider a dataset that contains the spectra of 
three different cultivars of the same fruit (cantaloupe — 
Cucumis melo L. Cantaloupensis). The cultivars 
(named D, M and HA) have sizes 490, 106 and 500, 
and all spectra were measured in 256 wavelengths. 
The dataset thus contains 1096 observations and 256 
variables. First, a robust principal component anal- 
ysis (as described in the next section) was applied to 
reduce the dimension of the data space, and the first 
two components were retained. For a more detailed 
description and analysis of these data, see Hubert 
and Van Driessen (2004). 
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Fig. 9. (bu) Classical tolerance ellipses for the fruit data with common covariance matrix; (h) robust tolerance ellipses. 
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The data were divided randomly in a training set 
and a validation set, containing 60% and 40% of the 
observations. Figure 9 shows the training data. In 
this figure cultivar D is marked with crosses, culti- 
var M with circles and cultivar HA with diamonds. 
We see that cultivar HA has a cluster of outliers 
that are far away from the other observations. As it 
turns out, these outliers were caused by a change in 
the illumination system. To classify the data, we will 
use model (23) with a common covariance matrix 
Figure 9(a) shows the classical tolerance ellipses for 

the groups, given by (x - /i^ )' E (x - fi^ ) = X2,o.975 • 
Note how strongly the classical covariance estima- 
tor of the common S is influenced by the outlying 
subgroup of cultivar HA. On the other hand, Fig- 
ure 9(b) shows the same data with the correspond- 
ing robust tolerance ellipses. 

The effect on the resulting classical linear discrim- 
inant rules is dramatic for cultivar M. It appears 
that all the observations are badly classified because 
they would have to belong to a region that lies com- 
pletely outside the boundary of this figure! The ro- 
bust discriminant analysis does a better job. The 
tolerance ellipses are not affected by the outliers and 
the resulting discriminant lines split up the different 
groups more accurately. The misclassification rates 
are 17% for cultivar D, 95% for cultivar M and 6% 
for cultivar HA. The misclassification rate of culti- 
var M remains very high. This is due to the intrin- 
sic overlap between the three groups, and the fact 
that cultivar M has few data points compared to the 
others. (When we impose that all three groups are 
equally important by setting the membership proba- 
bilities equal to 1/3, we obtain a better classification 
of cultivar M with 46% of errors.) 

This example thus clearly shows that outliers can 
have a huge effect on the classical discriminant rules, 
whereas the robust version fares better. 

6. PRINCIPAL COMPONENT ANALYSIS 

6.1 Classical PCA 

Principal component analysis is a popular statis- 
tical method which tries to explain the covariance 
structure of data by means of a small number of 
components. These components are linear combina- 
tions of the original variables, and often allow for 
an interpretation and a better understanding of the 
different sources of variation. Because PCA is con- 
cerned with data reduction, it is widely used for 



the analysis of high-dimensional data which are fre- 
quently encountered in chemometrics, computer vi- 
sion, engineering, genetics, and other domains. PCA 
is then often the first step of the data analysis, fol- 
lowed by discriminant analysis, cluster analysis, or 
other multivariate techniques (see, e.g., Hubert and 
Engelen, 2004). It is thus important to find those 
components that contain most of the information. 

In the classical approach, the first component cor- 
responds to the direction in which the projected 
observations have the largest variance. The second 
component is then orthogonal to the first and again 
maximizes the variance of the projected data points. 
Continuing in this way produces all the principal 
components, which correspond to the eigenvectors 
of the empirical covariance matrix. Unfortunately, 
both the classical variance (which is being maxi- 
mized) and the classical covariance matrix (which is 
being decomposed) are very sensitive to anomalous 
observations. Consequently, the first components are 
often pulled toward outlying points, and may not 
capture the variation of the regular observations. 
Therefore, data reduction based on classical PCA 
(CPCA) becomes unreliable if outliers are present 
in the data. 

To illustrate this, let us consider a small artificial 
dataset in p = 4 dimensions. The Hawkins-Bradu- 
Kass dataset (see, e.g., Rousseeuw and Leroy, 1987) 
consists of n = 75 observations in which two groups 
of outliers were created, labeled 1-10 and 11-14. The 
first two eigenvalues explain already 98% of the to- 
tal variation, so we select k = 2. The CPCA scores 
plot is depicted in Figure 10(a). In this figure we 
can clearly distinguish the two groups of outliers, 
but we see several other undesirable effects. We first 
observe that, although the scores have zero mean, 
the regular data points lie far from zero. This stems 
from the fact that the mean of the data points is a 
bad estimate of the true center of the data in the 
presence of outliers. It is clearly shifted toward the 
outlying group, and consequently the origin even 
falls outside the cloud of the regular data points. 
On the plot we have also superimposed the 97.5% 
tolerance ellipse. We see that the outliers 1-10 are 
within the tolerance ellipse, and thus do not stand 
out based on their Mahalanobis distance. The ellipse 
has stretched itself to accommodate these outliers. 

6.2 Robust PCA 

The goal of robust PCA methods is to obtain 
principal components that are not influenced much 
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Fig. 10. Score plot and 97.5% tolerance ellipse of the Hawkins-Bradu-Kass data obtained with (a.) C'PCA; (h) MCD. 



by outliers. A first group of methods is obtained 
by replacing the classical covariance matrix by a 
robust covariance estimator. Maronna (1976) and 
Campbell (1980) proposed using affine equivariant 
M-estimators of scatter for this purpose, but these 
cannot resist many outliers. Croux and Haesbroeck 
(2000) used positive-breakdown estimators of scat- 
ter such as the MCD and S-estimators. Recently, 
Salibian-Barrera, Van Aelst and Willems (2006) pro- 
posed using S- or MM-estimators of scatter and de- 
veloped a fast robust bootstrap procedure for infer- 
ence and to assess the stability of the PCA solution. 
Let us reconsider the Hawkins-Bradu-Kass data in 
p = A dimensions. Robust PCA using the weighted 
MCD estimator yields the score plot in Figure 10(b). 
We now see that the center is correctly estimated in 
the middle of the regular observations. The 97.5% 
tolerance ellipse nicely encloses these points and ex- 
cludes all 14 outliers. 

Unfortunately, the use of these affine equivariant 
covariance estimators is limited to small to moder- 
ate dimensions. To see why, consider, for example, 
the MCD estimator. If p denotes the number of vari- 
ables in our dataset, the MCD estimator can only be 
computed if p < h] otherwise the covariance matrix 
of any /i-subset has zero determinant. Since h <n, p 
can never be larger than n. A second problem is the 
computation of these robust estimators in high di- 
mensions. Today's fastest algorithms (Woodruff and 
Rocke, 1994; Rousseeuw and Van Driessen, 1999) 



can handle up to about 100 dimensions, whereas 
there are fields like chemometrics, which need to an- 
alyze data with dimensions in the thousands. 

A second approach to robust PCA uses projection 
pursuit (PP) techniques. These methods maximize 
a robust measure of spread to obtain consecutive di- 
rections on which the data points are projected. In 
Hubert, Rousseeuw and Verboven (2002) a PP algo- 
rithm is presented, based on the ideas of Li and Chen 
(1985) and Croux and Ruiz-Gazen (1996, 2005). It 
has been successfully applied in several studies, for 
example, to detect outliers in large microarray data 
(Model et al., 2002). Asymptotic results about this 
approach are presented in Cui, He and Ng (2003). 

Hubert, Rousseeuw and Vanden Branden (2005) 
proposed a robust PCA method, called ROBPCA, 
which combines ideas of both projection pursuit and 
robust covariance estimation. The PP part is used 
for the initial dimension reduction. Some ideas based 
on the MCD estimator are then applied to this lower- 
dimensional data space. Simulations in Hubert, 
Rousseeuw and Vanden Branden (2005) have shown 
that this combined approach yields more accurate 
estimates than the raw PP algorithm. An outline of 
the ROBPCA algorithm is given in Appendix A. 3. 

The ROBPCA method applied to a dataset X„^p 
yields robust principal components which can be 
collected in a loading matrix Pj, ^ with orthogonal 
columns, and a robust center /i^.. From here on the 
subscripts to a matrix serve to recall its size; fore 
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Fig. 11. (a.) Different types of outliers when a three-dimensional dataset is projected on a robust two-dimensional PCA-sub- 
space; (h) the corresponding PC A outlier map. 



example, X„^p is an n x p matrix and Pp,fc is p x A;. 
(Note that it is possible to robustly scale the vari- 
ables first by dividing them by a robust scale esti- 
mate; see, e.g., Rousseeuw and Croux, 1993.) The 
robust scores are the A; x 1 column vectors 

tj = (Pp,fc)'(Xi - (i^). 

The orthogonal distance measures the distance be- 
tween an observation and its projection in the k- 
dimensional PCA subspace: 



(24) 



OD,: 



X,; 



Pp,fcti I 



Let L denote the diagonal matrix which contains the 
eigenvalues Ij of the MCD scatter matrix, sorted 
from largest to smallest. The score distance of Xj 
with respect to /ij^jP and L is then defined as 



SDi = V (xi - Ax)'Pp,fcLfci(Pp,fc)'(xi - (i^) 



\ 



All the above mentioned methods are translation 
and orthogonal equivariant, that is, (2)-(3) hold for 
any vector v and any pxp matrix A with A A' = I. 
To be precise, let jl^ and P denote the robust cen- 
ter and loading matrix of the original observations 
Xj . Then the robust center and loadings of the trans- 
formed data Axj + v are equal to A/x^ + v and AP. 
The scores (and distances) remain the same after 
this transformation, since 

ti(Axi + v) = P'A'(Axi + V - {Afi.^ + v)) 

= P'(xi-/i^)=ti(xi). 

We also mention the robust LTS-subspace esti- 
mator and its generalizations, introduced and dis- 
cussed in Rousseeuw and Leroy (1987) and Maronna 
(2005). The idea behind these approaches consists 
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in minimizing a robust scale of the orthogonal dis- 
tances, similar to the LTS estimator and S-estimators 
in regression. For functional data, a fast PCA method 
is introduced in Locantore et al. (1999). 

6.3 Outlier Map 

The result of the PCA analysis can be represented 
by means of the outlier map given in Hubert, 
Rousseeuw and Vanden Branden (2005). As in re- 
gression, this figure highlights the outliers and clas- 
sifies them into several types. In general, an out- 
lier is an observation which does not obey the pat- 
tern of the majority of the data. In the context of 
PCA, this means that an outlier either lies far from 
the subspace spanned by the k eigenvectors, and/or 
that the projected observation lies far from the bulk 
of the data within this subspace. This can be ex- 
pressed by means of the orthogonal and the score 
distances. These two distances define four types of 
observations, as illustrated in Figure 11(a). Regular 
observations have a small orthogonal and a small 
score distance. Bad leverage points, such as observa- 
tions 2 and 3, have a large orthogonal distance and 
a large score distance. They typically have a large 
influence on classical PCA, as the eigenvectors will 
be tilted toward them. When points have a large 
score distance but a small orthogonal distance, we 
call them good leverage points. Observations 1 and 
4 in Figure 7(a) can be classified into this category. 
Finally, orthogonal outliers have a large orthogonal 
distance, but a small score distance, as, for example, 
case 5. They cannot be distinguished from the reg- 
ular observations once they are projected onto the 
PCA subspace, but they lie far from this subspace. 

The outlier map in Figure 11(b) displays the ODj 
versus the SDj. In this plot, lines are drawn to distin- 
guish the observations with a small and a large OD, 
and with a small and a large SD. For the latter dis- 
tances, the cutoff value c = ^Jxtfi 975 used. For the 
orthogonal distances, the approach of Box (1954) is 
followed. The squared orthogonal distances can be 
approximated by a scaled distribution which in 
its turn can be approximated by a normal distribu- 
tion using the Wilson-Hilferty transformation. The 
mean and variance of this normal distribution are 
then estimated by applying the univariate MCD to 
the ODp. 

6.4 Example 

We illustrate the PCA outlier map on a dataset 
consisting of spectra of 180 archaeological glass pieces 



over p = 750 wavelengths (Lemberge et al., 2000). 
The measurements were performed using a Jeol JSM 
6300 scanning electron microscope equipped with 
an energy-dispersive Si(Li) X-ray detection system. 
Three principal components were retained for CPCA 
and ROBPCA, yielding the outlier maps in Fig- 
ure 12. In Figure 12(a) we see that CPCA does not 
find big outliers. On the other hand the ROBPCA 
plot in Figure 12(b) clearly distinguishes two major 
groups in the data, as well as a smaller group of bad 
leverage points, a few orthogonal outliers, and the 
isolated case 180 in between the two major groups. 
A high-breakdown method such as ROBPCA de- 
tects the smaller group with cases 143-179 as a set 
of outliers. Later, it turned out that the window of 
the detector system had been cleaned before the last 
38 spectra were measured. As a result less X-ray ra- 
diation was absorbed, resulting in higher X-ray in- 
tensities. The other bad leverage points (57-63) and 
(74-76) are samples with a large concentration of 
calcic. The orthogonal outliers (22, 23 and 30) are 
borderline cases, although it turned out that they 
have larger measurements at the channels 215-245. 
This might indicate a larger concentration of phos- 
phorus. 

7. PRINCIPAL COMPONENT REGRESSION 

Principal component regression is typically used 
for linear regression models (10) or (17) where the 
number of independent variables p is very large or 
where the regressors are highly correlated (this is 
known as multicollinearity) . An important applica- 
tion of PCR is multivariate calibration in chemo- 
metrics, which predicts constituent concentrations 
of a material based on its spectrum. This spectrum 
can be obtained via several techniques such as flu- 
orescence spectrometry, near-infrared spectrometry 
(NIR), nuclear magnetic resonance (NMR), ultravi- 
olet spectrometry (UV), energy dispersive X-ray flu- 
orescence spectrometry (ED-XRF), and so on. Since 
a spectrum typically ranges over a large number of 
wavelengths, it is a high-dimensional vector with 
hundreds of components. The number of concentra- 
tions, on the other hand, is usually limited to at 
most, say, five. In the univariate approach, only one 
concentration at a time is modeled and analyzed. 
The more general problem assumes that the number 
of response variables q is larger than 1, which means 
that several concentrations are to be estimated to- 
gether. This model has the advantage that the co- 
variance structure between the concentrations is also 



ROBUST MULTIVARIATE STATISTICS 



19 



UH 

iteu 



i 



1 

m 

m 


1 ■ P r ' 




t 





56, M 

ST 

■ 


• 





Fig. 12. PCj4 outlier map of the glass dataset based on three principal components, computed with ( a,) CPCA; (h) ROBPCA. 
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Fig. 13. Robust R-RMSECVu curve for the Biscuit Dough dataset. 



taken into account, which is appropriate when the 
concentrations are known to be strongly correlated 
with each other. 

Classical PGR (CPCR) starts by replacing the 
large number of explanatory variables Xj by a small 
number of loading vectors, which correspond to the 
first (classical) principal components of X„^p. Then 
the response variables Yj are regressed on these com- 
ponents using least squares regression. It is thus 
a two-step procedure, which starts by computing 



scores tj for every data point. Then the yj are re- 
gressed on the tj. 

The robust PGR method proposed by Hubert and 
Verboven (2003) combines robust PGA for high-di- 
mensional x-data with a robust multivariate regres- 
sion technique such as MGD regression described 
in Section 4. The robust scores tj obtained with 
ROBPGA thus serve as the explanatory variables 
in the regression model (10) or (17). 

The RPGR method inherits the y-affine equivari- 
ance [the second equation in (19)] from the MGD re- 
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gression method. RPCR is also x-translation equiv- 
ariant and x-orthogonally equivariant, that is, the 
estimates satisfy the third equation in (19) for any 
orthogonal matrix A. These properties follow in a 
straightforward way from the orthogonal equivari- 
ance of the ROBPCA method. Robust PGR meth- 
ods which are based on nonequivariant PC A estima- 
tors, such as those proposed in Pell (2000), are not 
x-equivariant. 

An important issue in PGR is selecting the num- 
ber of principal components, for which several meth- 
ods have been proposed. A popular approach mini- 
mizes the root mean squared error of cross-validation 
criterion RMSEGVfc which, for one response vari- 
able (g = 1), equals 



(25) 



RMSEGVt 



\ 



1 



y-i,kY 



with y-i,k the predicted value for observation where 
i was left out of the dataset when performing the 
PGR method with k principal components. The goal 
of the RMSEGVfc statistic is twofold. It yields an 
estimate of the root mean squared prediction error 
E{y — y)"^ when k components are used in the model, 
whereas the curve of RMSEGVfc for k = 1, . . . , A;max 
is a popular graphical tool to choose the optimal 
number of components. 

This RMSEGVfc statistic is, however, not suited 
at contaminated datasets because it also includes 
the prediction error of the outliers in (25). There- 
fore Hubert and Verboven (2003) proposed a ro- 
bust RMSEGV measure. These R-RMSEGVfc val- 
ues were rather time consuming, because for every 
choice of k they required the whole RPGR proce- 
dure to be performed n times. Faster algorithms for 
cross-validation have recently been developed (En- 
gelen and Hubert, 2005). They avoid the complete 
recomputation of resampling methods such as the 
MGD when one observation is removed from the 
dataset. 

To illustrate RPGR we analyze the Biscuit Dough 
dataset of Osborne et al. (1984), preprocessed as 
in Hubert, Rousseeuw and Verboven (2002). This 
dataset consists of 40 NIR spectra of biscuit dough 
with measurements every 2 nanometers, from 1200 
nm up to 2400 nm. The responses are the percent- 
ages of four constituents in the biscuit dough: yi = 
fat, 2/2 = flour, 7/3 = sucrose and 7/4 = water. Because 
there is a significant correlation among the responses, 
a multivariate regression is performed. The robust 



R-RMSEGVfc curve is plotted in Figure 13 and sug- 
gests to select k = 2 components. 

Differences between GPGR and RPGR show up 
in the loading vectors and in the calibration vec- 
tors. Figure 14 shows the second loading vector and 
the second calibration vector for 7/3 (sucrose) . For in- 
stance, GPGR and RPGR give quite different results 
between wavelengths 1390 and 1440 (the so-called 
G-H bend). 

Next, we can construct outlier maps as in Sec- 
tions 4 and 6.3. ROBPGA yields the PGA outlier 
map displayed in Figure 15(a). We see that there 
are no leverage points but there are some orthogonal 
outliers, the largest being 23, 7 and 20. The result 
of the regression step is shown in Figure 15(b). It 
plots the robust distances of the residuals (or the 
standardized residuals if g = 1) versus the score dis- 
tances. RPGR shows that observation 21 has an ex- 
tremely high residual distance. Other vertical out- 
liers are 23, 7, 20 and 24, whereas there few 
borderline cases. 

8. PARTIAL LEAST SQUARES REGRESSION 

Partial least squares regression (PLSR) is similar 
to PGR. Its goal is to estimate regression coefficients 
in a linear model with a large number of x-variables 
which are highly correlated. In the first step of PGR, 
the scores were obtained by extracting the main 
information present in the rr-variables by perform- 
ing a principal component analysis on them, with- 
out using any information about the ?/- variables. In 
contrast, the PLSR scores are computed by maxi- 
mizing a covariance criterion between the x- and y- 
variables. Hence, this technique uses the responses 
already from the start. 

More precisely, let X„^p and Y„,^q denote the mean- 
centered data matrices, with Xj = Xj — x and yi = 
Yi — y. The normalized PLS weight vectors Va and 



qa (with ||ra|| = ||qa|| = 
vectors that maximize 



1) are then defined as the 



(26) cov(Yqa, Xra) = q^ 



Y'X 



n ■ 



X'Y 



IS 



for each a = 1, . . . ,k, where 'Sy^ = "tl^y — 
the empirical cross-covariance matrix between the 
X- and the 7/- variables. The elements of the scores tj 
are then defined as linear combinations of the mean- 
centered data: iia = x^r^, or equivalently Tn^k = 
X„,pRp,fc with Rp^k = (ri, . . . ,rfc). 
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(a) 



Fig. 14. Second loading vector and calibration vector of sucrose for the Biscuit Dough dataset, computed with (a.) CPCR; 
(h) RPCR. 



•M ' 
•J ■ 



RPCR 



* 1 



*2C 



« e 



1 I » 1 

Swi« UHtaiM* 41 ivy 




«4 I 1 1 t 



(b) 



Fig. 15. (a.) PC A outlier map when 



RPCR to the Biscuit Dough dataset; (h ) corresponding regression outlier map. 



The computation of the PLS weight vectors can 
be performed using the SIMPLS algorithm (de Jong, 
1993), which is described in Appendix A. 4. 

Hubert and Vanden Branden (2003) developed 
the robust method RSIMPLS. It starts by applying 
ROBPCA on the x- and y-variables in order to re- 
place "^xy and by robust estimates, and then pro- 
ceeds analogously to the SIMPLS algorithm. Simi- 
larly to RPCR, a robust regression method (ROBPC. 
regression) is performed in the second stage. Van- 



den Branden and Hubert (2004) proved that for 
low-dimensional data the RSIMPLS approach yields 
bounded influence functions for the weight vectors 
Ta and (\a and for the regression estimates. Also the 
breakdown value is inherited from the MCD estima- 
tor. 

The robustness of RSIMPLS is illustrated on the 
octane dataset (Esbensen, Schonkopf and Midtgaard, 
1994), consisting of NIR absorbance spectra over 
p = 226 wavelengths ranging from 1102 nm to 1552 
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Fig. 16. (a.) Score outlier map of the octane dataset using the SIMPLS results; (h) based on RSIMPLS; (c) regression 
outlier map based on SIMPLS; (A) based on RSIMPLS. 



nm with measurements every two nanometers. For 
each of the n = 39 production gasoUne samples the 
octane number y was measured, so g = 1. It is known 
that the octane dataset contains six outhers (25, 26, 
36-39) to which alcohol was added. From the RM- 
SECV values (Engelen et al., 2004) it follows that 
k = 2 components should be retained. 

The SIMPLS outlier map is Figure 16(a). We see 
that the classical analysis only detects the outlying 
spectrum 26, which does not even stick out much 
above the border line. The robust score outlier map 



is displayed in Figure 16(b). Here we immediately 
spot the six samples with added alcohol. The ro- 
bust regression outlier map in Figure 16(d) shows 
that the outliers are good leverage points, whereas 
SIMPLS again only reveals spectrum 26. 

Note that canonical correlation analysis tries to 
maximize the correlation between linear combina- 
tions of the X- and the y-variables, instead of the co- 
variance in (26). Robust methods for canonical cor- 
relation are presented in Croux and Dehon (2002). 



ROBUST MULTIVARIATE STATISTICS 



23 



9. SOME OTHER MULTIVARIATE 
FRAMEWORKS 

Apart from the frameworks covered in the previ- 
ous sections, there is also work in other multivari- 
ate settings. These methods cannot be described 
in detail here due to lack of space, but here are 
some pointers to the literature. In the framework 
of multivariate location and scatter, an MCD-based 
alternative to the Hotelling test was provided by 
Willems et al. (2002) and a technique based on ro- 
bust distances was applied to the control of elec- 
trical power systems in Mill et al. (1996). High- 
breakdown regression techniques were extended to 
computer vision settings (e.g., Meer et al., 1991; 
Stewart, 1995). For generalized linear models, ro- 
bust approaches have been proposed by Cantoni and 
Ronchetti (2001), Kiinsch, Stefanski and Carroll 
(1989), Markatou, Basu and Lindsay (1998), Miiller 
and Neykov (2003) and Rousseeuw and Christmann 

(2003) . A high-breakdown method for mixed linear 
models has been proposed by Copt and Victoria- 
Feser (2006). Robust nonlinear regression methods 
have been studied by Stromberg (1993), Stromberg 
and Ruppert (1992) and Mizera (2002), who con- 
sidered a depth-based approach. Boente, Pires and 
Rodrigues (2002) introduced robust estimators for 
common principal components. Robust methods were 
proposed for factor analysis (Pison et al., 2003) and 
independent component analysis (Brys, Hubert and 
Rousseeuw, 2005). Croux et al. (2003) fitted gen- 
eral multiplicative models such as FANOVA. Ro- 
bust clustering methods have been investigated by 
Kaufman and Rousseeuw (1990), Cuesta-Albertos, 
Gordaliza and Matran (1997) and Hardin and Rocke 

(2004) . Robustness in time series analysis and econo- 
metrics has been studied by Martin and Yohai (1986), 
Bustos and Yohai (1986), Muler and Yohai (2002), 
Franses, Kloek and Lucas (1999), van Dijk, Franses 
and Lucas (1999a, 1999b) and Lucas and Franses 
(1998). Of course, this short list is far from com- 
plete. 

10. AVAILABILITY 

Stand-alone programs carrying out FAST-MCD 
and FAST-LTS can be downloaded from the web- 
site http : //www . agoras . ua . ac . be, as well as Mat- 
lab versions. The FAST-MCD algorithm is avail- 
able in the package S-PLUS (as the built-in func- 
tion cov.mcd), in R (as part of the packages rrcov, 
robust and robustbase), and in SAS/IML Version 7. 



It is also included in SAS Version 9 (in PROC RO- 
BUSTREG). These packages all provide the one- 
step weighted MCD estimates. The LTS is available 
in S-PLUS as the built-in function Itsreg, which uses 
a slower algorithm and has a low default breakdown 
value. The FAST-LTS algorithm is available in R (as 
part of rrcov and robustbase) and in SAS/IML Ver- 
sion 7. In SAS Version 9 it is incorporated in PROC 
ROBUSTREG. 

Matlab functions for most of the procedures men- 
tioned in this paper (MCD, LTS, MCD-regression, 
RQDA, ROBPCA, RPCR and RSIMPLS) are part 
of LIBRA, a Matlab LIBrary for Robust Analysis 
(Verboven and Hubert, 2005) which can be down- 
loaded from http : //wis .kuleuven.be/stat/robust. 
Several of these functions are also available in the 
PLS toolbox of Eigenvector Research 
(www . eigenvector . com). 

APPENDIX 

A.l The FAST-MCD Algorithm 

Rousseeuw and Van Driessen (1999) developed the 
FAST-MCD algorithm to efficiently compute the MCD. 
The key component is the C-step: 

Theorem. Take X = {xi, . . . ,x„} and let Hi C 
{1, . . . ,n} he an h-suhset, that is, \Hi \ = h. Put (ii : = 
iEiGHiXi and El := i^ig^^(xi-/xi)(xi-/xi)'. // 
det(5]i) 7^0, define the relative distances 

di{i) := \J (xj - - fii) 

for i = 1, . . . ,n. 

Now take H2 such that {di{i)]i € H2} := {(^1)1:^ • • • , 
{di)h:n] where (di)i;n < ((ii)2:n < ••• < {di)n;n are 
the ordered distances, and compute fi2 and S2 based 
on H2. Then 

det(l;2) < det(i:i) 

with equality if and only if p,2 = (i-i and S2 = Si. 

If det(Si) > 0, the C-step yields % with det(S2) < 
det(Si). Note that the C stands for "concentra- 
tion" since S2 is more concentrated (has a lower 
determinant) than Si. The condition det(Si) 7^ 
in the C-step theorem is no real restriction because 
if det(Si) = we already have the minimal objec- 
tive value. 

In the algorithm the C-step works as follows. Given 
(Aold'^old): 



24 



M. HUBERT, P. J. ROUSSEEUW AND S. VAN AELST 



1. compute the distances c?oid(«) for i = 1, . . . , n 

2. sort these distances, which yields a permutation 
TT for which (ioid('^(l)) ^ '^oid('''"(2)) < ■•■ < 
dold(vr(n)) 

3. put i/ncw:={vr(l),7r(2),...,7r(/i)} 

4. compute fi^^^ := ave(i?ncw) and Snew : = 

COv(i?new)- 

For a fixed number of dimensions p, the C-step takes 
only 0(n) time [because -ffncw can be determined in 
0(n) operations without fully sorting all the doid(0 
distances] . 

C-steps can be iterated until det(Slnew) = or 
det(SnGw) = det(Soid)- The sequence of determi- 
nants obtained in this way must converge in a fi- 
nite number of steps because there are only finitely 
many /i-subsets. However, there is no guarantee that 
the final value det(Sncw) of the iteration process is 
the global minimum of the MCD objective function. 
Therefore an approximate MCD solution can be ob- 
tained by taking many initial choices of Hi, applying 
C-steps to each and keeping the solution with low- 
est determinant. For more discussion on resampling 
algorithms, see Hawkins and Olive (2002). 

To construct an initial subset Hi , a random (p -|- 
l)-subset J is drawn and /Xg := ave(J) and YIq := 
cov( J) are computed. [If det(llo) = 0, then J can be 
extended by adding observations until det(l]o) > 0.] 
Then, for i = 1, . . . ,n the distances dQ^i) := (xj — 

ftf^ySQ (Xj — /Xq) are computed and sorted into 
do(7r(l)) < • • • < doiT^in)), which leads to Hi := {vr(l), 
. . . ,7r(/i)}. This method yields better initial subsets 
than drawing random /i-subsets directly, because the 
probability of drawing an outlier-free subset is much 
higher when drawing (p -|- l)-subsets than with h- 
subsets. 

The FAST-MCD algorithm contains several com- 
putational improvements. Since each C-step involves 
the calculation of a covariance matrix, its determi- 
nant and the corresponding distances, using fewer 
C-steps considerably improves the speed of the al- 
gorithm. It turns out that after two C-steps, many 
runs that will lead to the global minimum already 
have a considerably smaller determinant. Therefore, 
the number of C-steps is reduced by applying only 
two C-steps on each initial subset and selecting the 
10 different subsets with lowest determinants. Only 
for these 10 subsets, further C-steps are taken until 
convergence. 

This procedure is very fast for small sample sizes 
n, but when n grows the computation time increases 



due to the n distances that need to be calculated in 
each C-step. For large n FAST-MCD uses a parti- 
tioning of the dataset, which avoids doing all the 
calculations in the entire data. In any case, let /x^p^ 

and Sopt denote the mean and covariance matrix 
of the /i-subset with lowest covariance determinant. 
Then the algorithm returns 

Amcd = Aopt and Smcd = Ch,„Sopt, 

where Ch^n is the product of a consistency factor and 
a finite-sample correction factor (Pison, Van Aelst 
and Willems, 2002). Note that the FAST-MCD al- 
gorithm is itself affine equivariant. 

A.2 The FAST-LTS Algorithm 

The basic component of the LTS algorithm is again 
the C-step, which now says that starting from an ini- 
tial /i-subset Hi or an initial fit 6i , we can construct 
a new /i-subset H2 by taking the h observations with 
smallest absolute residuals |rj(0i)|. Applying LS to 
H2 then yields a new fit 62 which is guaranteed to 
have a lower objective function (11). 

To construct the initial /i-subsets the algorithm 
starts from randomly drawn {p+ l)-subsets. For each 
{p + l)-subset the coefficients 6q of the hyperplane 
through the points in the subset are calculated. [If a 
{p + l)-subset does not define a unique hyperplane, 
then it is extended by adding more observations un- 
til it does.] The corresponding initial /i-subset is 
then formed by the h points closest to the hyper- 
plane (i.e., with smallest residuals). As was the case 
for the MCD, also here this approach yields much 
better initial fits than would be the case if random 
/i-subsets were drawn directly. 

Let 0opt denote the least squares fit of the optimal 
/i-subset found by the whole resampling procedure; 
then FAST-LTS returns 



^LTS = ^opt 



and 



CLTS = Ch, 



A. 3 The ROBPCA Algorithm 

First, the data are preprocessed by reducing their 
data space to the subspace spanned by the n obser- 
vations. This is done by a singular value decompo- 
sition of X„^p. As a result, the data are represented 
using at most n — 1 = rank(X„_p) variables without 
loss of information. 
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In the second step of the ROBPCA algorithm, a 
measure of outlyingness is computed for each data 
point. This is obtained by projecting the high-dimen- 
sional data points on many univariate directions. 
On every direction the univariate MOD estimator 
of location and scale is computed, and for every 
data point its standardized distance to that center 
is measured. Finally for each data point its largest 
distance over all the directions is considered. The 
h data points with smallest outlyingness are kept, 
and from the covariance matrix S/j of this /i-subset 
we select the number k of principal components to 
retain. 

The last stage of ROBPCA consists of project- 
ing the data points onto the fc-dimensional subspace 
spanned by the largest eigenvectors of S/j and of 
computing their center and shape using the weighted 
MOD estimator. The eigenvectors of this scatter ma- 
trix then determine the robust principal components, 
and the location estimate serves as a robust center. 

A.4 The SIMPLS Algorithm 

The solution of the maximization problem (26) 
is found by taking ri and qi as the first left and 
right singular eigenvectors of Xlxy- The other PLSR 
weight vectors and qa for a = 2, . . . ,k are ob- 
tained by imposing an orthogonality constraint to 
the elements of the scores. If we require that 
J27=i tiatib = for a 7^ 6, a deflation of the cross- 
covariance matrix "S^y provides the solutions for 
the other PLSR weight vectors. This deflation is 
carried out by first calculating the x-loading = 
Y]xTCa/{Ti''a^x^a) with Sj; the empirical variance-co- 
variance matrix of the x-variables. Next an orthonor- 
mal base {vi, . . . , Vq} of {pi, . . . ,pa} is constructed 
and Tixy is deflated as 

^xy = ^xy ~ ^ai^a^xy ) 

with 'S^y = 'Sxy In general the PLSR weight vectors 
Fa and qa are obtained as the left and right singular 
vectors of S^.^. 

ACKNOWLEDGMENTS 

We would like to thank Sanne Engelen, Karlien 
Vanden Branden and Sabine Verboven for help with 
preparing the figures of this paper. 

REFERENCES 

Adrover, J. and Zamar, R. H. (2004). Bias robustness of 
three median-based regression estimates. J. Statist. Plann. 
Inference 122 203-227. MR2057923 



Agullo, J., Croux, C. and Van Aelst, S. (2006). The mul- 
tivariate least trimmed squares estimator. J. Multivariate 
Anal. To appear. 

Alqallaf, F. a., Konis, K. P., Martin, R. D. and Za- 
mar, R. H. (2002). Scalable robust covariance and corre- 
lation estimates for data mining. In Proceedings of the Sev- 
enth ACM SIGKDD International Conference on Knowl- 
edge Discovery and Data Mining. Edmonton. 

Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, 
P. J., Rogers, W. H. and Tukey, J. W. (1972). Ro- 
bust Estimates of Location: Survey and Advances. Prince- 
ton Univ. Press. MR0331595 

Bai, Z. D. and He, X. (2000). Asymptotic distributions of the 
maximal depth estimators for regression and multivariate 
location. Ann. Statist. 27 1616-1637. MR1742502 

Becker, C. and Gather, U. (1999). The masking break- 
down point of multivariate outlier identification rules. J. 
Amer. Statist. Assoc. 94 947-955. MR1723295 

Berrendero, J. R. and Zamar, R. H. (2001). Maximum 
bias curves for robust regression with non-elliptical regres- 
sors. Ann. Statist. 29 224-251. MR1833964 

BOENTE, G., PiRES, A. M. and Rodrigues, I. (2002). In- 
fiuence functions and outlier detection under the com- 
mon principal components model: A robust approach. 
Biometrika 89 861-875. MR1946516 

Box, G. E. P. (1954). Some theorems on quadratic forms 
applied in the study of analysis of variance problems. I. 
Eff'ect of inequality of variance in one-way classification. 
Ann. Math. Statist. 25 290-302. MR0061787 

Brys, G., Hubert, M. and Rousseeuw, P. J. (2005). 
A robustification of independent component analysis. J. 
Chemometrics 19 364-375. 

BuSTOS, O. H. and YOHAI, V. J. (1986). Robust estimates 
for ARMA models. J. Amer. Statist. Assoc. 81 155-168. 
MR0830576 

Butler, R. W., Davies, P. L. and Jhun, M. (1993). Asymp- 

totics for the Minimum Covariance Determinant estimator. 

Ann. Statist. 21 1385-1400. MR1241271 
Campbell, N. A. (1980). Robust procedures in multivariate 

analysis I: Robust covariance estimation. Appl. Statist. 29 

231-237. 

Cantoni, E. and RONCHETTI, E. (2001). Robust inference 
for generalized linear models. J. Amer. Statist. Assoc. 96 
1022-1030. MR1947250 

Chen, T.-C. and Victoria-Feser, M. (2002). High break- 
down estimation of multivariate location and scale with 
missing observations. British J. Math. Statist. Psych. 55 
317-335. MR1949260 

COAKLEY, C. W. and Hettmansperger, T. p. (1993). 
A bounded infiuence, high breakdown, efficient regression 
estimator. J. Amer. Statist. Assoc. 88 872-880. MR1242938 

Copt, S. and Victoria-Feser, M.-P. (2006). High break- 
down inference in the mixed linear model. J. Amer. Statist. 
Assoc. 101 292-300. MR2268046 

Croux, C. and Dehon, C. (2001). Robust linear discriminant 
analysis using S-estimators. Canad. J. Statist. 29 473-493. 
MR1872648 

Croux, C. and Dehon, C. (2002). Analyse canonique basee 
sur des estimateurs robustes de la matrice de covariance. 
La Revue de Statistique Appliquee 2 5-26. 



26 



M. HUBERT, P. J. ROUSSEEUW AND S. VAN AELST 



Croux, C. and Demon, C. (2003). Estimators of the multi- 
ple correlation coefficient: Local robustness and confidence 
intervals. Statist. Papers 44 315-334. MR1996955 

Croux, C, Filzmoser, P., Pison, P. and Rousseeuw, 
P. J. (2003). Fitting multiplicative models by robust alter- 
nating regressions. Statist. Comput. 13 23-36. MR1973864 

Croux, C. and Haesbroeck, G. (1999). Influence function 
and efficiency of the Minimum Covariance Determinant 
scatter matrix estimator. J. Multivariate Anal. 71 161-190. 
MR1735108 

Croux, C. and Haesbroeck, G. (2000). Principal compo- 
nents analysis based on robust estimators of the covariance 
or correlation matrix: Influence functions and efficiencies. 
Biometrika 87 603-618. MR1789812 

Croux, C, Rousseeuw, P. J. andHossjER, O. (1994). Gen- 
eralized S-estimators. J. Amer. Statist. Assoc. 89 1271- 
1281. MR1310221 

Croux, C. and Ruiz-Gazen, A. (1996). A fast algorithm for 
robust principal components based on projection pursuit. 
In COMPSTAT 1996 211-217. Physica, Heidelberg. 

Croux, C. and Ruiz-Gazen, A. (2005). High breakdown es- 
timators for principal components: The projection-pursuit 
approach revisited. J. Multivariate Anal. 95 206-226. 
MR2164129 

Cuesta-Albertos, J. A., Gordaliza, A. and Matran, C. 
(1997). Trimmed k-means: An attempt to robustify quan- 
tizers. Ann. Statist. 25 553-576. MR1439314 

Cui, H., He, X. and No, K. W. (2003). Asymptotic distribu- 
tions of principal components based on robust dispersions. 
Biometrika 90 953-966. MR2024769 

Davies, L. (1987). Asymptotic behavior of S-estimators of 
multivariate location parameters and dispersion matrices. 
Ann. Statist. 15 1269-1292. MR0902258 

Davies, L. (1992a). The asymptotics of Rousseeuw's min- 
imum volume ellipsoid estimator. Ann. Statist. 20 1828- 
1843. MR1193314 

Davies, L. (1992b). An efficient Frechet differentiable high 
breakdown multivariate location and dispersion estimator. 
J. Multivariate Anal. 40 311-327. MR1150615 

Davies, P. L. and Gather, U. (2005). Breakdown and 
groups. Ann. Statist. 33 977-1035. MR2195626 

DE Jong, S. (1993). SIMPLS: an alternative approach to par- 
tial least squares regression. Chemometrics and Intelligent 
Laboratory Systems 18 251-263. 

Donoho, D. L. (1982). Breakdown properties of multivari- 
ate location estimators. Qualifying paper. Harvard Univ., 
Boston. 

Donoho, D. L. and Gasko, M. (1992). Breakdown prop- 
erties of location estimates based on halfspace depth 
and projected outlyingness. Ann. Statist. 20 1803-1827. 
MR1193313 

Donoho, D. L. and Huber, P. J. (1983). The notion of 
breakdown point. In A Festschrift for Erich Lehmann 
(P. Bickel, K. Doksum and J. Hodges, eds.) 157-184. 
Wadsworth, Belmont, CA. MR0689745 

Engelen, S. and Hubert, M. (2005). Fast model selection 
for robust calibration methods. Analytica Chimica Acta 
544 219-228. 

Engelen, S., Hubert, M., Vanden Branden, K. and Ver- 
BOVEN, S. (2004). Robust PGR and robust PLS: A compar- 



ative study. In Theory and Applications of Recent Robust 
Methods (M. Hubert, G. Pison, A. Struyf and S. V. Aelst, 
eds.) 105-117. Birkhauser, Basel. MR2085889 

ESBENSEN, K., SCHONKOPF, S. and MiDTGAARD, T. (1994). 

Multivariate Analysis in Practice. Camo, Trondheim. 
Ferretti, N., Kelmansky, D., Yohai, V. J. and Zamar, 
R. H. (1999). A class of locally and globally robust re- 
gression estimates. J. Amer. Statist. Assoc. 94 174-188. 
MR1689223 

Franses, p. H., Kloek, T. and Lucas, A. (1999). Out- 
lier robust analysis of long-run marketing effects for weekly 
scanning data. J. Econometrics 89 293-315. 

Friedman, J. H. and Tukey, J. W. (1974). A projec- 
tion pursuit algorithm for exploratory data analysis. IEEE 
Transactions on Computers C 23 881-889. 

Garcia Ben, M., Martinez, E. and Yohai, V. J. (2006). 
Robust estimation for the multivariate linear model based 
on a r-scale. J. Multivariate Anal. 97 1600-1622. 

Hampel, F. R. (1975). Beyond location parameters: Robust 
concepts and methods. Bull. Internat. Statist. Inst. 46 375- 
382. MR0483172 

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. 
and Stahel, W. A. (1986). Robust Statistics: The Ap- 
proach Based on Influence Functions. Wiley, New York. 
MR0829458 

Hardin, J. and Rocke, D. M. (2004). Outlier detection 
in the multiple cluster setting using the minimum covari- 
ance determinanr estimator. Comput. Statist. Data Anal. 
44 625-638. MR2026436 

Hawkins, D. M. and McLachlan, G. J. (1997). High- 
breakdown linear discriminant analysis. J. Amer. Statist. 
Assoc. 92 136-143. MR1436102 

Hawkins, D. M. and Olive, D. (2002). Inconsistency of re- 
sampling algorithms for high breakdown regression esti- 
mators and a new algorithm (with discussion). J. Amer. 
Statist. Assoc. 97 136-159. MR1947276 

He, X. and Fung, W. K. (2000). High breakdown estimation 
for multiple populations with applications to discriminant 
analysis. J. Multivariate Anal. 72 151-162. MR1740638 

He, X. and Simpson, D. G. (1993). Lower bounds for con- 
tamination bias: Globally minimax versus locally linear es- 
timation. Ann. Statist. 21 314-337. MR1212179 

He, X., Simpson, D. G. and Wang, G. (2000). Breakdown 
points of t-type regression estimators. Biometrika 87 675- 
687. MR1789817 

Hossjer, O. (1994). Rank-based estimates in the linear 
model with high breakdown point. J. Amer. Statist. As- 
soc. 89 149-158. MR1266292 

Huber, P. J. (1973). Robust regression: Asymptotics, con- 
jectures and Monte Carlo. Ann. Statist. 1 799-821. 
MR0356373 

Huber, P. J. (1981). Robust Statistics. Wiley, New York. 
MR0606374 

Huber, P.J. (1985). Projection pursuit. Ann. Statist. 13 435- 
525. MR0790553 

Hubert, M. and Engelen, S. (2004). Robust PCA and clas- 
sification in biosciences. Bioinformatics 20 1728-1736. 

Hubert, M. and Rousseeuw, P. J. (1996). Robust regres- 
sion with both continuous and binary regressors. J. Statist. 
Plann. Infer. 57 153-163. 



ROBUST MULTIVARIATE STATISTICS 



27 



Hubert, M., Rousseeuw, P. J. and Vanden Branden, 
K. (2005). ROBPCA: A new approach to robust principal 
components analysis. Technometrics 47 64-79. MR2135793 

Hubert, M., Rousseeuw, P. J. and Verboven, S. (2002). 
A fast robust method for principal components with ap- 
plications to chemometrics. Chemometrics and Intelligent 
Laboratory Systems 60 101-111. 

Hubert, M. and Van Driessen, K. (2004). Fast and robust 
discriminant analysis. Comput. Statist. Data Anal. 45 301- 
320. MR2045634 

Hubert, M. and Vanden Branden, K. (2003). Robust 
methods for Partial Least Squares Regression. J. Chemo- 
metrics 17 537-549. 

Hubert, M. and Verboven, S. (2003). A robust PGR 
method for high-dimensional regressors. J. Chemometrics 
17 438-452. 

Johnson, R. A. and Wichern, D. W. (1998). Applied Multi- 
variate Statistical Analysis. Prentice Hall Inc., Englewood 
Cliffs, NJ. MR1168210 

Jureckova, ,J. (1971). Nonparametric estimate of regression 
coefficients. Ann. Math. Statist. 42 1328-1338. MR0295487 

Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups 
in Data. Wiley, New York. MR1044997 

Kent, J. T. and Tyler, D. E. (1996). Constrained M- 
estimation for multivariate location and scatter. Ann. 
Statist. 24 1346-1370. MR1401854 

Koenker, R. and Portnoy, S. (1987). L-estimation for 
linear models. J. Amer. Statist. Assoc. 82 851-857. 
MR0909992 

Krasker, W. S. and Welsch, R. E. (1982). EflScient 
bounded- influence regression estimation. J. Amer. Statist. 
Assoc. 77 595-604. MR0675886 

Kunsch, H. R., Stefanski, L. A. and Carroll, R. J. 
(1989). Conditionally unbiased bounded influence estima- 
tion in general regression models with applications to gen- 
eralized linear models. J. Amer. Statist. Assoc. 84 460-466. 
MR1010334 

Lemberge, p., De Raedt, I., Janssens, K. H., Wei, F. 
and Van Espen, P. J. (2000). Quantitative Z-analysis of 
16th-17th century archaelogical glass vessels using PLS re- 
gression of EPXMA and ^-XRF data. J. Chemometrics 14 
751-763. 

Li, G. and Chen, Z. (1985). Projection-pursuit approach to 
robust dispersion matrices and principal components: Pri- 
mary theory and Monte Carlo. J. Amer. Statist. Assoc. 80 
759-766. 

Liu, R. Y., Parelius, ,J. M. and Singh, K. (1999). Multivari- 
ate analysis by data depth: Descriptive statistics, graphics 
and inference. Ann. Statist. 27 783-840. MR1724033 

LocANTORE, N., Marron, J. S., SiMPSON, D. G., Tripoli, 
N., Zhang, J. T. and Cohen, K. L. (1999). Robust prin- 
cipal component analysis for functional data. Test 8 1-73. 
MR1707596 

LOPUHAA, H. P. (1989). On the relation between S-estimators 
and M-estimators of multivariate location and covariance. 
Ann. Statist. 17 1662-1683. MR1026304 

LoPUHAA, H. p. (1991). Multivariate r-estimators for loca- 
tion and scatter. Canad. J. Statist. 19 307-321. MR1144148 



LoPUHAA, H. P. (1999). Asymptotics of reweighted estima- 
tors of multivariate location and scatter. Ann. Statist. 27 
1638-1665. MR1742503 

LoPUHAA, H. p. and Rousseeuw, P. J. (1991). Breakdown 
points of afflne equivariant estimators of multivariate lo- 
cation and covariance matrices. Ann. Statist. 19 229-248. 
MR1091847 

Lucas, A. and Franses, P. H. (1998). Outlier detection in 
cointegration analysis. J. Bus. Econom. Statist. 16 459- 
468. 

Markatou, M., Basu, a. and Lindsay, B. G. (1998). 
Weighted likelihood equations with bootstrap root search. 
J. Amer. Statist. Assoc. 93 740-750. MR1631378 

Markatou, M. and He, X. (1994). Bounded influence and 
high breakdown point testing procedures in linear models. 
J. Amer. Statist. Assoc. 89 543-549. MR1294081 

Markatou, M. and Hettmansperger, T. P. (1990). Ro- 
bust bounded-influence tests in linear models. J. Amer. 
Statist. Assoc. 85 187-190. MR1137365 

Maronna, R. a. (1976). Robust M-estimators of multivari- 
ate location and scatter. Ann. Statist. 4 51-67. MR0388656 

Maronna, R. A. (2005). Principal components and orthog- 
onal regression based on robust scales. Technometrics 47 
264-273. MR2164700 

Maronna, R. A., Bustos, O. and Yohai, V. J. (1979). 
Bias- and efficiency-robustness of general M-estimators for 
regression with random carriers. Smoothing Techniques for 
Curve Estimation (T. Gasser and M. Rosenblatt, eds.). 
Lecture Notes in Math. 757 91-116. Springer, New York. 
MR0564254 

Maronna, R. A., Martin, D. R. and Yohai, V. J. (2006). 
Robust Statistics: Theory and Methods. Wiley, New York. 
MR2238141 

Maronna, R. A., Stahel, W. A. and Yohai, V. ,J. (1992). 
Bias-robust estimators of multivariate scatter based on pro- 
jections. J. Multivar. Anal. 42 141-161. MR1177523 

Maronna, R. A. and Yohai, V. J. (1981). Asymptotic be- 
havior of general M-estimators for regression and scale 
with random carriers. Z. Wahrsch. Verw. Cebiete 58 7-20. 
MR0635268 

Maronna, R. A. and Yohai, V. J. (1993). Bias-robust esti- 
mates of regression based on projections. Ann. Statist. 21 
965-990. MR1232528 

Maronna, R. A. and Yohai, V. ,J. (1995). The behavior of 
the Stahel-Donoho robust multivariate estimator. J. Amer. 
Statist. Assoc. 90 330-341. MR1325140 

Maronna, R. A. and Yohai, V. J. (2000). Robust regression 
with both continuous and categorical predictors. J. Statist. 
Plann. Infer. 89 197-214. MR1794422 

Maronna, R. A. and Zamar, R. H. (2002). Robust multi- 
variate estimates for high dimensional data sets. Techno- 
metrics 44 307-317. MR1939680 

Martin, R. D. and Yohai, V. .J. (1986). Influence function- 
als for time series. Ann. Statist. 14 781-818. MR0856793 

Martin, R. D., Yohai, V. J. and Zamar, R. H. (1989). Min- 
max bias robust regression. Ann. Statist. 17 1608-1630. 
MR1026302 

Meer, p., Mintz, D., Rosenfeld, A. and Kim, D. Y. 
(1991). Robust regression methods in computer vision: A 
review. Internat. J. Comput. Vision 6 59-70. 



28 



M. HUBERT, P. J. ROUSSEEUW AND S. VAN AELST 



Mendes, B. and Tyler, D. E. (1996). Constrained M- 
estimates for regression. Robust Statistics: Data Analysis 
and Computer Intensive Methods (H. Rieder, ed.). Lec- 
ture Notes m Statist. 109 299-320. Springer, New York. 
MR1491412 

MiLi, L., Cheniae, M. G., Vichare, N. S. and Rousseeuw, 
P. J. (1996). Robust state estimation based on projection 
statistics. IEEE Transactions on Power Systems 11 1118- 
1127. 

MiZERA, I. (2002). On depth and deep points: A calculus. 

Ann. Statist. 30 1681-1736. MR1969447 
Model, F., Konig, T., Piepenbrock, C. and Adorjan, P. 

(2002). Statistical process control for large scale microarray 

experiments. Bioinformatics 1 1-9. 
Muler, N. and YOHAI, V. J. (2002). Robust estimates 

for ARCH processes. J. Time Series Anal. 23 341-375. 

MR1908596 

MijLLER, C. H. and Neykov, N. (2003). Breakdown points 
of trimmed likelihood estimators and related estimators in 
generalized linear models. J. Statist. Plann. Infer. 116 503- 
519. MR2000097 

MijLLER, S. and Welsh, A. H. (2006). Outlier robust model 
selection in linear regression. J. Amer. Statist. Assoc. 100 
1297-1310. MR2236443 

Odewahn, S. C, Djorgovski, S. G., Brunner, R. J. and 
Gal, R. (1998). Data from the digitized palomar sky sur- 
vey. Technical report, California Institute of Technology. 

Osborne, B. G., Fearn, T., Miller, A. R. and Douglas, 
S. (1984). Application of near infrared reflectance spec- 
troscopy to the compositional analysis of biscuits and bis- 
cuit dough. J. Scientific Food Agriculture 35 99-105. 

Pell, R. J. (2000). Multiple outlier detection for multivariate 
calibration using robust statistical techniques. Chemomet- 
rics and Intelligent Laboratory Systems 52 87-104. 

PisoN, G., Rousseeuw, P. J., Filzmoser, P. and Croux, 
C. (2003). Robust factor analysis. J. Multivariate Anal. 84 
145-172. MR1965827 

PiSON, G., Van Aelst, S. and Willems, G. (2002). Small 
sample corrections for LTS and MCD. Metrika 55 111-123. 
MR1903287 

ROCKE, D. M. (1996). Robustness properties of S-estimators 
of multivariate location and shape in high dimension. Ann. 
Statist. 24 1327-1345. MR1401853 

RoCKE, D. M. and Woodruff, D. L. (1996). Identification 
of outliers in multivariate data. J. Amer. Statist. Assoc. 91 
1047-1061. MR1424606 

Ronchetti, E., Field, C. and Blanchard, W. (1997). Ro- 
bust linear model selection by cross-validation. J. Amer. 
Statist. Assoc. 92 1017-1023. MR1482132 

Ronchetti, E. and Staudte, R. G. (1994). A robust ver- 
sion of Mallows' Cp. J. Amer. Statist. Assoc. 89 550-559. 
MR1294082 

Rousseeuw, P. J. (1984). Least median of squares regression. 
J. Amer. Statist. Assoc. 79 871-880. MR0770281 

Rousseeuw, P.J. (1985). Multivariate estimation with high 
breakdown point. In Mathematical Statistics and Applica- 
tions, B (W. Grossmann, G. Pflug, I. Vincze and W. Wertz, 
eds.). Reidel Publishing Company, Dordrecht. MR0851060 



Rousseeuw, P. J. and Christmann, A. (2003). Robustness 
against separation and outliers in logistic regression. Com- 
put. Statist. Data Anal. 43 315-332. MR1996815 

Rousseeuw, P. J. and Croux, C. (1993). Alternatives to 
the median absolute deviation. J. Amer. Statist. Assoc. 88 
1273-1283. MR1245360 

Rousseeuw, P. J. and Hubert, M. (1999). Regression 
depth. J. Amer. Statist. Assoc. 94 388-402. MR1702314 

Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regres- 
sion and Outlier Detection. Wiley-Interscience, New York. 
MR0914792 

Rousseeuw, P. J., Ruts, I. and Tukey, J. W. (1999a). 
The bagplot: A bivariate boxplot. American Statistician 
53 382-387. 

Rousseeuw, P. J., Van Aelst, S. and Hubert, M. (1999b). 
Rejoinder to the discussion of "Regression depth." J. 
Amer. Statist. Assoc. 94 388-433. MR1702314 

Rousseeuw, P. J., Van Aelst, S., Van Driessen, K. and 
Agullo, J. (2004). Robust multivariate regression. Tech- 
nometrics 46 293-305. MR2082499 

Rousseeuw, P. J. and Van Driessen, K. (1999). A fast 
algorithm for the minimum covariance determinant esti- 
mator. Technometrics 41 212-223. 

Rousseeuw, P. J. and Van Driessen, K. (2006). Comput- 
ing LTS regression for large data sets. Data Mining and 
Knowledge Discovery 12 29-45. MR2225526 

Rousseeuw, P. J. and van Zomeren, B. C. (1990). Un- 
masking multivariate outliers and leverage points. J. Amer. 
Statist. Assoc. 85 633-651. 

Rousseeuw, P. J. and Yohai, V. J. (1984). Robust regres- 
sion by means of S-estimators. Robust and Nonlinear Time 
Series Analysis (J. Franke, W. Hardle and R. Martin, eds.). 
Lecture Notes m Statist. 26 256-272. Springer, New York. 
MR0786313 

Salibian-Barrera, M., Van Aelst, S. and Willems, G. 
(2006). PC A based on multivariate MM-estimators with 
fast and robust bootstrap. J. Amer. Statist. Assoc. 101 
1198-1211. 

Salibian-Barrera, M. and Yohai, V. J. (2006). A fast 
algorithm for S-regression estimates. J. Comput. Graph. 
Statist. 15 414-427. MR2246273 

Salibian-Barrera, M. and Zamar, R. H. (2002). Boot- 
strapping robust estimates of regression. Ann. Statist. 30 
556-582. MR1902899 

Siegel, a. F. (1982). Robust regression using repeated me- 
dians. Biometrika 69 242-244. 

Simpson, D. G., Ruppert, D. and Carroll, R. J. (1992). 
On one-step GM-estimates and stability of inferences in 
linear regression. J. Amer. Statist. Assoc. 87 439-450. 
MR1173809 

Simpson, D. G. and Yohai, V. J. (1998). Functional stability 
of one-step estimators in approximately linear regression. 
Ann. Statist. 26 1147-1169. MR1635458 

Stahel, W. a. (1981). Robuste Schatzungen: Infinites- 
imale Optimalitat und Schatzungen von Kovarianzma- 
trizen. Ph.D. thesis, ETH Ziirich. 

Stewart, C. V. (1995). MINPRAN: A new robust estima- 
tor for computer vision. IEEE Trans. Pattern Anal. Mach. 
Intelligence 17 925-938. 



ROBUST MULTIVARIATE STATISTICS 



29 



Stromberg, a. J. (1993). Computation of high breakdown 
nonlinear regression. J. Amer. Statist. Assoc. 88 237-244. 

Stromberg, A. J. and Ruppert, D. (1992). Breakdown in 
nonlinear regression. J. Amer. Statist. Assoc. 87 991-997. 
MR1209560 

Tatsuoka, K. S. and Tyler, D. E. (2000). On the unique- 
ness of S-functionals and M-functionals under nonelliptical 
distributions. Ann. Statist. 28 1219-1243. MR1811326 

Tyler, D. E. (1994). Finite-sample breakdown points of 
projection-based multivariate location and scatter statis- 
tics. Ann. Statist. 22 1024-1044. MR1292555 

Van Aelst, S. and Rousseeuw, P. J. (2000). Robustness 
of deepest regression. J. Multivariate Anal. 73 82-106. 
MR1766122 

Van Aelst, S., Rousseeuw, P. J., Hubert, M. and 
Struyf, a. (2002). The deepest regression method. J. 
Multivariate Anal. 81 138-166. MR1901211 

Van Aelst, S. and Willems, G. (2005). Multivariate re- 
gression S-estimators for robust estimation and inference. 
Statist. Sinica 15 981-1001. MR2234409 

VAN DiJK, D., Franses, p. H. and Lucas, A. (1999a). Test- 
ing for ARCH in the presence of additive outliers. J. Appl. 
Econometrics 14 539-562. 

VAN Dlik, D., Franses, P. H. and Lucas, A. (1999b). Test- 
ing for smooth transition nonlinearity in the presence of 
outhers. J. Bus. Econom. Statist. 17 217-235. 

Vanden Branden, K. and Hubert, M. (2004). Robustness 
properties of a robust PLS regression method. Analytica 
Chimica Acta 515 229-241. 

Verboven, S. and Hubert, M. (2005). LIBRA: A Matlab 
library for robust analysis. Chemometrics and Intelligent 



Laboratory Systems 75 127-136. 
Willems, G., Pison, G., Rousseeuw, P. J. and Van Aelst, 
S. (2002). A robust Hotelling test. Metrika 55 125-138. 
MR1903288 

Woodruff, D. L. and Rocke, D. M. (1994). Computable 
robust estimation of multivariate location and shape in 
high dimension using compound estimators. J. Amer. 
Statist. Assoc. 89 888-896. MR1294732 

YOHAI, V. J. (1987). High breakdown point and high ef- 
ficiency robust estimates for regression. Ann. Statist. 15 
642-656. MR0888431 

YOHAI, V. J. and Zamar, R. H. (1988). High breakdown 
point estimates of regression by means of the minimization 
of an efficient scale. J. Amer. Statist. Assoc. 83 406-413. 
MR0971366 

Zamar, R. H. (1989). Robust estimation in the errors in vari- 
ables model. Biometnka 76 149-160. MR0991433 

Zamar, R. H. (1992). Bias robust estimation in orthogonal 
regression. Ann. Statist. 20 1875-1888. MR1193316 

Zuo, Y., Cui, H. and He, X. (2004). On the Stahel-Donoho 
estimator and depth-weighted means of multivariate data. 
Ann. Statist. 32 167-188. MR2051003 

Zuo, Y. and Serfling, R. (2000a). General notions of statis- 
tical depth function. Ann. Statist. 28 461-482. MR1790005 

Zuo, Y. and Serfling, R. (2000b). Nonparametric notions of 
multivariate "scatter measure" and "more scattered" based 
on statistical depth functions. J. Multivariate Anal. 75 62- 
78. MR1787402 



